formal verification
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > China > Hong Kong (0.04)
- Europe > Italy > Lazio > Rome (0.04)
- (2 more...)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > China > Hong Kong (0.04)
- (13 more...)
BRIDGE: Building Representations In Domain Guided Program Verification
George, Robert Joseph, Eisenach, Carson, Ghai, Udaya, Perrault-Joncas, Dominique, Anandkumar, Anima, Foster, Dean
Large language models (LLMs) have achieved impressive results in code generation, yet struggle with program verification, especially in interactive proof frameworks such as Lean4. A central challenge is scalability: verified synthesis requires not just code, but also precise specifications and correctness proofs, and existing approaches rarely span all three domains. We present BRIDGE, the first systematic study of structured prompting for scalable verified program generation. BRIDGE decomposes verification into three interconnected domains: Code (executable implementations), Specifications (formal intent statements), and Proofs (constructive correctness arguments). Our key idea is to elicit distinct reasoning behaviors functional, specification-driven, and proof-oriented as intermediate representations that preserve semantic structure and connect these domains. Through systematic ablations, we show that this approach substantially improves both accuracy and efficiency beyond standard error feedback methods. For example, functional reasoning improves correctness of code in formal languages (Lean4) by nearly 1.5x (pass@5) over direct baselines. In inference-time compute, functional reasoning is also 2x more efficient, achieving higher pass rates with fewer generations and lower total sampling budgets. Similarly, we find that specification-driven prompting boosts Python coding pass rates by up to 17.5%. These findings suggest that structured domain alignment is a promising direction for advancing verified synthesis. BRIDGE establishes a foundation for training via expert iteration or RLVR, enabling models to internalize these reasoning strategies across code, specifications, and proofs.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > United States > California (0.04)
Hey AI, Generate Me a Hardware Code! Agentic AI-based Hardware Design & Verification
Gadde, Deepak Narayan, Radhakrishna, Keerthan Kopparam, Viswambharan, Vaisakh Naduvodi, Kumar, Aman, Lettnin, Djones, Kunz, Wolfgang, Simon, Sebastian
Modern Integrated Circuits (ICs) are becoming increasingly complex, and so is their development process. Hardware design verification entails a methodical and disciplined approach to the planning, development, execution, and sign-off of functionally correct hardware designs. This tedious process requires significant effort and time to ensure a bug-free tape-out. The field of Natural Language Processing has undergone a significant transformation with the advent of Large Language Models (LLMs). These powerful models, often referred to as Generative AI (GenAI), have revolutionized how machines understand and generate human language, enabling unprecedented advancements in a wide array of applications, including hardware design verification. This paper presents an agentic AI-based approach to hardware design verification, which empowers AI agents, in collaboration with Humain-in-the-Loop (HITL) intervention, to engage in a more dynamic, iterative, and self-reflective process, ultimately performing end-to-end hardware design and verification. This methodology is evaluated on five open-source designs, achieving over 95% coverage with reduced verification time while demonstrating superior performance, adaptability, and configurability.
- Asia > India (0.05)
- South America > Brazil > Amazonas > Manaus (0.04)
- Europe > Germany > Rhineland-Palatinate > Landau (0.04)
- Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)
AI for software engineering: from probable to provable
Vibe coding, the much-touted use of AI techniques for programming, faces two overwhelming obstacles: the difficulty of specifying goals ("prompt engineering" is a form of requirements engineering, one of the toughest disciplines of software engineering); and the hallucination phenomenon. Programs are only useful if they are correct or very close to correct. The solution? Combine the creativity of artificial intelligence with the rigor of formal specification methods and the power of formal program verification, supported by modern proof tools.
Bridging LLM Planning Agents and Formal Methods: A Case Study in Plan Verification
Ramani, Keshav, Tawosi, Vali, Alamir, Salwa, Borrajo, Daniel
We introduce a novel framework for evaluating the alignment between natural language plans and their expected behavior by converting them into Kripke structures and Linear Temporal Logic (LTL) using Large Language Models (LLMs) and performing model checking. We systematically evaluate this framework on a simplified version of the PlanBench plan verification dataset and report on metrics like Accuracy, Precision, Recall and F1 scores. Our experiments demonstrate that GPT-5 achieves excellent classification performance (F1 score of 96.3%) while almost always producing syntactically perfect formal representations that can act as guarantees. However, the synthesis of semantically perfect formal models remains an area for future exploration.
- Europe > United Kingdom > England > Greater London > London (0.05)
- North America > United States (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)
Towards Formal Verification of LLM-Generated Code from Natural Language Prompts
Councilman, Aaron, Fu, David Jiahao, Gupta, Aryan, Wang, Chengxiao, Grove, David, Wang, Yu-Xiong, Adve, Vikram
In the past few years LLMs have emerged as a tool that can aid programmers by taking natural language descriptions and generating code based on it. However, the reliability of LLM code generation and current validation techniques for it are far from strong enough to be used for mission-critical or safety-critical applications. In this work we explore ways to offer formal guarantees of correctness to LLM generated code; such guarantees could improve the quality of general AI Code Assistants and support their use for critical applications. To address this challenge we propose to incorporate a Formal Query Language that can represent a user's intent in a formally defined but natural language-like manner that a user can confirm matches their intent. We then have a formal specification of the user intent which we can use to verify that LLM-generated code matches the user's intent. We implement these ideas in our system, Astrogator, for the Ansible programming language, widely used for system administration, including for critical systems. The system includes an intuitive formal query language, a calculus for representing the behavior of Ansible programs, and a symbolic interpreter and a unification algorithm which together are used for the verification. A key innovation in Astrogator is the use of a Knowledge Base to capture system-specific implementation dependencies that greatly reduce the need for system knowledge in expressing formal queries. On a benchmark suite of 21 code-generation tasks, our verifier is able to verify correct code in 83% of cases and identify incorrect code in 92%.
- North America > United States > Illinois > Champaign County > Urbana (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- (11 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
VeriThoughts: Enabling Automated Verilog Code Generation using Reasoning and Formal Verification
Yubeaton, Patrick, Nakkab, Andre, Xiao, Weihua, Collini, Luca, Karri, Ramesh, Hegde, Chinmay, Garg, Siddharth
This paper introduces VeriThoughts, a novel dataset designed for reasoning-based Verilog code generation. We establish a new benchmark framework grounded in formal verification methods to evaluate the quality and correctness of generated hardware descriptions. Additionally, we present a suite of specialized small-scale models optimized specifically for Verilog generation. Our work addresses the growing need for automated hardware design tools that can produce verifiably correct implementations from high-level specifications, potentially accelerating the hardware development process while maintaining rigorous correctness guarantees. Our code and data are available at \href{https://github.com/wilyub/VeriThoughts}{this URL}.
- North America > United States > New York > Kings County > New York City (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > China > Hong Kong (0.04)
- Europe > Italy > Lazio > Rome (0.04)
- (2 more...)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > China > Hong Kong (0.04)
- (13 more...)